Variable Selection in Large Environmental Data Sets Using Principal Components Analysis

Authors

  • JACQUELYNNE R. KING
  • DONALD A. JACKSON
Abstract

In many large environmental datasets, redundant variables can be discarded without the loss of extra variation. Principal components analysis can be used to select those variables that contain the most information. Using an environmental dataset consisting of 36 meteorological variables spanning 37 years, four methods of variable selection are examined along with different criteria levels for deciding on the number of variables to retain. Procrustes analysis, a measure of similarity, and bivariate plots are used to assess the success of the alternative variable selection methods and criteria levels in extracting representative variables. The broken-stick model is a consistent approach to choosing significant principal components and is chosen here as the more suitable criterion in combination with a selection method that requires one principal components analysis and retains variables by starting with selection from the first component. Copyright © 1999 John Wiley & Sons, Ltd.
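The combination the abstract favours can be sketched roughly as follows. This is a minimal illustration of one reading of that description, not the authors' implementation. It assumes the PCA is run on the correlation matrix, that a component is retained while its share of variance exceeds the broken-stick expectation, and that one variable (the highest absolute loading not yet chosen) is kept per retained component, working down from the first component. The function names `broken_stick` and `select_variables` and the synthetic demo data are hypothetical.

```python
import numpy as np


def broken_stick(p):
    """Expected variance share of each of p components under the broken-stick model.

    b_k = (1/p) * sum_{i=k}^{p} 1/i, for k = 1..p.
    """
    inverse = 1.0 / np.arange(1, p + 1)
    # Reverse cumulative sum gives sum_{i=k}^{p} 1/i for every k.
    return inverse[::-1].cumsum()[::-1] / p


def select_variables(X):
    """Return column indices of retained variables, one per retained component.

    Assumption: PCA on the correlation matrix of X (rows = observations).
    """
    R = np.corrcoef(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1]  # sort components by decreasing variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    # Retain leading components whose variance share beats the broken-stick value.
    p = R.shape[0]
    share = eigvals / eigvals.sum()
    exceeds = share > broken_stick(p)
    n_keep = p if exceeds.all() else int(np.argmin(exceeds))  # index of first False

    # Starting from the first component, keep the variable with the largest
    # absolute loading that has not already been selected.
    chosen = np.zeros(p, dtype=bool)
    selected = []
    for k in range(n_keep):
        loadings = np.abs(eigvecs[:, k])
        loadings[chosen] = -1.0
        best = int(np.argmax(loadings))
        chosen[best] = True
        selected.append(best)
    return selected


# Synthetic demo data (not the meteorological dataset from the paper):
# columns 0-3 track one latent factor, columns 4-5 track another.
rng = np.random.default_rng(0)
f1 = rng.normal(size=500)
f2 = rng.normal(size=500)
X_demo = np.empty((500, 6))
X_demo[:, :4] = f1[:, None] + 0.3 * rng.normal(size=(500, 4))
X_demo[:, 4:] = f2[:, None] + 0.3 * rng.normal(size=(500, 2))
selected_demo = select_variables(X_demo)
```

On data with two clear groups of redundant variables, as in the synthetic demo, this sketch should keep one representative column from each group. Other retention criteria or selection rules can be swapped in by changing how `n_keep` is computed or how the loop picks variables.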

Similar resources

Persian Handwriting Analysis Using Functional Principal Components

Principal components analysis is a well-known statistical method for dealing with large dependent data sets. It is also used with functional data, both for data reduction and for representing variation. On the other hand, "handwriting" is one of the objects studied in various statistical fields such as pattern recognition and shape analysis. Considering time as the argument,...

Analysis of physiochemical and microbial quality of waters of the Karkheh River in southwestern Iran using multivariate statistical methods

Rapid population growth as well as agricultural and industrial development have increased the contamination of Iranian rivers. This study utilized principal components analysis (PCA) to determine the degree of significance of qualitative parameters of water resources in the Karkheh River in southwestern Iran. Cluster analysis (CA) grouped the monitoring stations based on the water quality data ...

Variable Selection and Principal Component Analysis

In most applied disciplines, many variables are often measured on each individual, which results in a huge data set consisting of a large number of variables, say p [Sharma (1996)]. Using such a data set in a statistical analysis may cause several problems. The dimensionality of the data set can often be reduced, without disturbing the main features of the whole data set by Principal...

Feature selection using genetic algorithm for classification of schizophrenia using fMRI data

In this paper we propose a new method for classification of subjects into schizophrenia and control groups using functional magnetic resonance imaging (fMRI) data. In the preprocessing step, the number of fMRI time points is reduced using principal component analysis (PCA). Then, independent component analysis (ICA) is used for further data analysis. It estimates independent components (ICs) of...

Two-stage Variable Clustering for Large Data Sets

In data mining, principal component analysis is a popular dimension reduction technique. It also provides a good remedy for the multicollinearity problem, but its interpretation of input space is not as good. To overcome the interpretation problem, principal components (cluster components) are obtained through variable clustering, which was implemented with PROC VARCLUS. The procedure uses obli...

Publication date: 1999